ABSTRACT

Fourteen novel medium reiteration frequency (MER) families were identified in the human genome using two different methods. Repetition frequencies per haploid human genome were estimated for each of these families, as well as for six previously described MER families. By these measurements, the families contain variable numbers of elements, ranging from 200 to 10,000 copies per haploid human genome.

INTRODUCTION 

The human genome, like those of other higher eukaryotes, contains a substantial amount of interspersed repetitive DNA sequence. Besides the three largest families (the Alu family, the L1 family, and the THE repeats), there are a number of minor families, referred to as medium reiteration frequency repeats (MERs).

Medium frequency repetitive sequences were first observed in naturally occurring SV40 variants (1,2). With the recent surge of structural information about the human genome, more examples of MER families have been uncovered. Two quite different approaches have proven useful for identifying MER families. One is a systematic computer analysis of the entire human DNA sequence database to detect repetitive elements which do not belong to known families (3). The other is a sequence library construction technique which selects regions of human DNA lying between closely spaced Alu repeats (4). In this report, we use both approaches to discover fourteen novel MER families, and we provide measurements of the repetition frequency of each family. There are certainly more families which have so far escaped detection.

EXPERIMENTAL METHODS 

Homology searches by computer-assisted analysis of the human DNA database

The GenBank DNA sequence database (Release 65) was subjected to a rapid search using the DASHER2 program as previously described (3).



Isolation of novel MER families from sequence libraries 

The Alu fragment library. Human genomic DNA (250 µg) was digested to completion with the restriction enzyme AluI. The resulting fragments were fractionated on a 6% polyacrylamide gel and fragments in the 500-1000 bp region were isolated. 50 ng of this size-fractionated DNA was ligated to 500 ng of SmaI-cleaved M13mp19 RF DNA (5). The vector was phosphatased before use. The ligated DNA was mixed with competent JM109 bacteria (Stratagene Inc.) and 20,000 resulting transformants were plated at a density of 1,650 plaques per 10 cm petri dish. Duplicate filter replicates were prepared and incubated separately with either of two radioactive oligonucleotide probes. One of these oligonucleotides was the 25-mer GTGGCTCA[C/T][A/G]CCTGTAATCCCAGCA. This sequence is from the 5' consensus sequence for Alu repeats (bases 12-36 from Fig. 1, ref 6). The other oligonucleotide was the 33-mer GG[C/T]TGCAGTGAGC[C/T][A/G][T/A]GAT[C/T][A/G][C/T][A/G]CCA[C/T]TGCACT. This sequence is from the 3' region of the Alu family consensus sequence (bases 218-250 from Fig. 1, ref 6). Phage which hybridized to both probes were replated at lower density and subjected to a repeat screening with the same probes. Single-stranded template DNA was extracted from 131 phage isolates grown in 2 ml cultures.
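The two probes above are written with bracketed alternatives ([C/T] means either C or T at that position), so each is really a mixture of oligonucleotides. The minimal Python sketch below, assuming only that bracket convention, parses a probe, checks its length, and counts how many distinct sequences the mixture contains. The helper names (parse_degenerate, mixture_size, expand) are hypothetical and are not taken from the original work.

```python
import itertools
import math

def parse_degenerate(oligo: str) -> list[list[str]]:
    """Return the allowed bases at each position of a bracket-notation oligo."""
    seq = "".join(oligo.split())  # ignore any spaces or line breaks
    positions: list[list[str]] = []
    i = 0
    while i < len(seq):
        if seq[i] == "[":
            j = seq.index("]", i)                      # end of the bracketed set
            positions.append(seq[i + 1:j].split("/"))  # "[C/T]" -> ["C", "T"]
            i = j + 1
        else:
            positions.append([seq[i]])                 # ordinary, fixed base
            i += 1
    return positions

def mixture_size(positions: list[list[str]]) -> int:
    """Number of distinct oligonucleotides in the degenerate mixture."""
    return math.prod(len(choices) for choices in positions)

def expand(positions: list[list[str]]) -> list[str]:
    """Enumerate every concrete sequence in the mixture."""
    return ["".join(combo) for combo in itertools.product(*positions)]

ALU_5_PROBE = "GTGGCTCA[C/T][A/G]CCTGTAATCCCAGCA"
ALU_3_PROBE = "GG[C/T]TGCAGTGAGC[C/T][A/G][T/A]GAT[C/T][A/G][C/T][A/G]CCA[C/T]TGCACT"

for name, probe in [("5' probe", ALU_5_PROBE), ("3' probe", ALU_3_PROBE)]:
    pos = parse_degenerate(probe)
    print(name, len(pos), "nt,", mixture_size(pos), "sequences")
```

With the probe sequences given above, this reports a 25-mer mixture of 4 sequences (2 degenerate positions) and a 33-mer mixture of 512 sequences (9 degenerate positions), matching the stated probe lengths.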

The PCR library. One µg of genomic human placental DNA was mixed with 20 µg each of two oligonucleotides. One oligonucleotide was the 31-mer GGGTCGACAGTGAGCCGAGATCGCGCCACTG, in which the 23 bases following the SalI site (GTCGAC) form an outward-facing primer at the 3' boundary of the Alu family consensus sequence (bases 224-246 from Fig. 1, ref 6). The second oligonucleotide was the 28-mer GGGGATCCTGGGATTACAGGCGTGAGCC, in which the 20 bases following the BamHI site (GGATCC) form an outward-facing primer at the 5' boundary of the Alu family consensus sequence (bases 33-14 from Fig. 1, ref 6). The reaction mixture was adjusted to a volume of 200 µl containing each dNTP at a final concentration of 300 µM, polymerase reaction buffer, and 40 units of Thermus aquaticus DNA polymerase (Amersham). The polymerase chain reaction (7) was performed for 25 cycles of 94° for 3 minutes, 50° for 5 minutes, and 72° for 5 minutes. The reaction was then extracted twice with an equal volume of phenol and once with an equal volume of CHCl3, and ethanol precipitated. The DNA was then dissolved in 50 µl of restriction enzyme buffer, and 25 units of SalI and 50 units of BamHI were added. After a 4 hour incubation at 37°, the reaction mixture was heated at 65° for 5 minutes and DNA fragments were purified by two cycles of preparative electrophoresis on a 1% polyacrylamide gel. DNA fragments in the 300-1000 bp region were isolated. 40 ng of this size-fractionated DNA was ligated to 100 ng of BamHI- and SalI-cleaved M13mp19 RF DNA (from which the 12 bp SalI-BamHI insert had been removed by S-300 gel filtration [Pharmacia]). The ligated DNA was mixed with competent JM109 bacteria (Stratagene Inc.) and 2000 resulting transformants were plated at a density of 150 plaques per 10 cm petri dish on NZY plates containing β-galactosidase indicator dye. Duplicate filter replicates were prepared and incubated separately with either of two radioactive probes. One probe was a double-stranded 69 bp sequence taken from the internal region of an Alu repeat (CACTTTGGGAGGCCAAGGCAGGTGGATCACCTCAAGTCAGGAGTTCAAGGCCAGCCTGACCAACATGGA). The second probe was a 5 kb fragment containing an L1 sequence from the region 5' to the human gamma-globin gene (8). A total of 431 phage plaques were selected by the criteria of not reacting with the dye indicator and not hybridizing to either of the radioactive probes. Single-stranded template DNA was extracted from small-scale (3 ml) cultures of these phage.
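The primer descriptions above imply a simple structure: a short 5' tail carrying a restriction site for cloning (SalI or BamHI), followed by the Alu-specific primer. A minimal Python sketch, assuming that the Alu-specific part is everything 3' of the first recognized site, splits each primer and checks the length of the Alu part against the stated consensus coordinates. The function names are hypothetical.

```python
# Recognition sites assumed to sit in the 5' tails of the two PCR primers.
SITES = {"SalI": "GTCGAC", "BamHI": "GGATCC"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def split_primer(primer: str) -> tuple[str, str, str]:
    """Split a primer into (enzyme, 5' tail through the site, Alu-specific 3' part)."""
    for enzyme, site in SITES.items():
        idx = primer.find(site)
        if idx != -1:
            cut = idx + len(site)
            return enzyme, primer[:cut], primer[cut:]
    raise ValueError("no SalI or BamHI site in primer")

PRIMER_3 = "GGGTCGACAGTGAGCCGAGATCGCGCCACTG"  # 31-mer, Alu bases 224-246
PRIMER_5 = "GGGGATCCTGGGATTACAGGCGTGAGCC"     # 28-mer, Alu bases 33-14

for primer, (first, last) in [(PRIMER_3, (224, 246)), (PRIMER_5, (33, 14))]:
    enzyme, tail, alu_part = split_primer(primer)
    expected = abs(last - first) + 1          # number of consensus bases covered
    print(enzyme, tail, alu_part, len(alu_part) == expected)

# The 5'-boundary primer is written antisense: its reverse complement reads
# along the Alu consensus, so the primer extends outward (upstream) of the element.
print(reverse_complement(split_primer(PRIMER_5)[2]))
```

The last line prints GGCTCACGCCTGTAATCCCA, which is one of the sequences allowed by the degenerate 5' hybridization probe of the Alu fragment library (its bases 3-22, i.e. consensus bases 14-33), consistent with the stated primer coordinates.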

DNA sequencing, amplification, hybridization, and sequence analysis

The single-stranded templates were sequenced by the dideoxy termination method (8) as modified (9) using T7 DNA polymerase (10). 400-600 bp of sequence information was obtained from each template. Oligonucleotides were synthesized by Research Genetics Inc. (Huntsville, AL) and labeled with polynucleotide kinase and [γ-32P]ATP. Double-stranded DNA was labeled by the random primer method (11). Hybridizations were performed at 60° in 6× SSC, 20 mM NaPO4, 10% dextran sulfate, 4× Denhardt's solution, 0.1% SDS, and 100 µg/ml denatured salmon sperm DNA for 16 hours. Washing was done twice for 5 minutes in 2× SSC, 0.2% SDS at room temperature, followed by two washes for 15 minutes in 0.2× SSC, 0.1% SDS at 45°. Autoradiography was for 6 hours to overnight with an intensifying screen at -70°. MER probes were constructed by synthesizing oligonucleotide primers suitable for polymerase chain reaction amplification (7) of the M13 sequence templates. The PCR products were then subcloned in Bluescript vectors (Stratagene Inc.). The sequence data were analysed on a Macintosh II microcomputer using MacVector 3.5 DNA analysis software from International Biotechnologies Inc. Southern blot hybridizations were performed according to Southern (12) as modified by Thomas (13).

Reiteration Frequency of DNA probes 

Reiteration frequencies of the probes were estimated by a plaque hybridization assay. Purified double-stranded DNA fragments used as probes were provided by Lagan Inc., Detroit MI. These DNA probes were labeled with 32P by the random primer method and purified by centrifugal chromatography on Sephadex G-25 (15). The probes were then hybridized in situ to 25,000-50,000 recombinant lambda bacteriophage plaques from a genomic human DNA library which had been immobilized on a 150 mm circular nitrocellulose filter (14). The reiteration frequency R of a particular probe was then estimated by the formula

$$R = \frac{P \times (2.5 \times 10^{6})}{N \times 15}$$

where P is the number of positive plaques binding the probe and N is the total number of plaques on the filter. In this formula, 2.5 × 10^6 represents the size of the human genome in kb, while 15 is the average size (in kb) of the human DNA inserted in the bacteriophage. As reflected by the numbers in Table I, this formula gives a statistical estimate, which may be in error by as much as 50%.
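The formula treats the filter as a sample of the genome: N plaques carrying 15 kb each cover N × 15 / (2.5 × 10^6) genome equivalents (0.15 to 0.3 for the 25,000-50,000 plaques used), and each positive plaque is implicitly counted as one copy. A minimal Python sketch of that arithmetic follows; the plaque counts in the example are hypothetical, not data from Table I, and the function names are illustrative only.

```python
GENOME_KB = 2.5e6  # haploid human genome size assumed in the formula (kb)
INSERT_KB = 15     # average insert size of the lambda library (kb)

def reiteration_frequency(positive_plaques: int, total_plaques: int,
                          genome_kb: float = GENOME_KB,
                          insert_kb: float = INSERT_KB) -> float:
    """Copies per haploid genome: R = P * genome_kb / (N * insert_kb)."""
    if total_plaques <= 0:
        raise ValueError("total_plaques must be positive")
    return positive_plaques * genome_kb / (total_plaques * insert_kb)

def genome_equivalents(total_plaques: int) -> float:
    """Fraction of one haploid genome represented on the filter."""
    return total_plaques * INSERT_KB / GENOME_KB

def error_range(estimate: float, relative_error: float = 0.5) -> tuple[float, float]:
    """Bounds implied by the stated uncertainty of up to 50%."""
    return estimate * (1 - relative_error), estimate * (1 + relative_error)

# Hypothetical filter: 600 positive plaques out of 25,000 screened.
r = reiteration_frequency(600, 25_000)
print(r, error_range(r))  # 4000.0 (2000.0, 6000.0)
```

Because a filter represents well under one genome, the estimate scales up a modest number of positive plaques, which is one reason the result is statistical rather than exact.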



RESULTS 

Sequence Analysis of novel MER families 

A total of 562 M13 templates were subjected to sequence analysis: 131 from the Alu fragment library and 431 from the PCR library. In both cases, the selection procedures were designed to identify DNA sequences 200-700 bp in length which are flanked by Alu repeats.

The Alu fragment method was based on the fact that most Alu family elements in the human genome contain a conserved site for the restriction enzyme AluI (16). When human genomic DNA was cleaved with this enzyme, a subset of the resulting fragments possessed the 3' portion of an Alu family repeat at one end and the 5' portion of an Alu family repeat at the other end, with an internal region of non-Alu family DNA in between. The AluI restriction fragments were cloned, and this subset was identified with two oligonucleotide hybridization probes: one specific for the 5' end of an Alu family repeat and the other specific for the 3' end.
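AluI recognizes AGCT and cuts between the G and the C, leaving blunt ends. The following Python sketch, assuming a plain uppercase sequence string, performs an in-silico AluI digest and applies the 500-1000 bp size window used for the Alu fragment library. It does not model the subsequent selection with the two Alu end probes, and the function names are hypothetical.

```python
ALUI_SITE = "AGCT"
CUT_OFFSET = 2  # AluI cuts AG^CT, i.e. two bases into the site

def alui_digest(seq: str) -> list[str]:
    """Return the fragments produced by complete AluI digestion of seq."""
    seq = seq.upper()
    fragments, start = [], 0
    i = seq.find(ALUI_SITE)
    while i != -1:
        cut = i + CUT_OFFSET
        fragments.append(seq[start:cut])
        start = cut
        i = seq.find(ALUI_SITE, i + 1)  # AGCT cannot overlap itself
    fragments.append(seq[start:])
    return fragments

def size_select(fragments: list[str], min_bp: int = 500, max_bp: int = 1000) -> list[str]:
    """Keep fragments inside the size window excised from the gel."""
    return [f for f in fragments if min_bp <= len(f) <= max_bp]

print(alui_digest("AAAGCTTTAGCTCC"))  # ['AAAG', 'CTTTAG', 'CTCC']
```

Since every AluI site becomes a fragment end, any region spanning an internal AluI site is split apart, which is the limitation of this library discussed later in the Results.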

The PCR method used two oligonucleotide primers which faced 
outward from the 5' and 3' regions of a consensus Alu family 
repeat sequence. When these primers were incubated with human 
genomic DNA under PCR conditions, the regions between Alu 
repeats were selectively amplified. These amplified fragments 
were then cloned. 
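Because one primer faces outward from each end of the Alu consensus, the region between two neighbouring Alu elements can be amplified whatever their relative orientation: the left element always presents one primer extending rightward and the right element one extending leftward. The Python sketch below illustrates this under stated assumptions: Alu positions are hypothetical (start, end, strand) annotations, "3' primer" and "5' primer" stand for the primers at the 3' and 5' boundaries of the consensus, and max_gap_bp is an assumed limit on amplifiable length. It reports the intervening region only, without the primer sequences.

```python
from typing import NamedTuple

class AluHit(NamedTuple):
    start: int   # leftmost coordinate of the Alu element
    end: int     # rightmost coordinate of the Alu element
    strand: str  # "+" if the Alu consensus reads left to right, "-" otherwise

def primer_extending_right(hit: AluHit) -> str:
    # On a "+" element the 3' end is on the right, so the 3'-boundary primer points right.
    return "3' primer" if hit.strand == "+" else "5' primer"

def primer_extending_left(hit: AluHit) -> str:
    # On a "+" element the 5' end is on the left, so the 5'-boundary primer points left.
    return "5' primer" if hit.strand == "+" else "3' primer"

def inter_alu_products(hits: list[AluHit], max_gap_bp: int) -> list[tuple[int, int, str, str]]:
    """Intervening regions between neighbouring Alu elements that could be amplified."""
    ordered = sorted(hits)
    products = []
    for left, right in zip(ordered, ordered[1:]):
        gap = right.start - left.end
        if 0 < gap <= max_gap_bp:
            products.append((left.end, right.start,
                             primer_extending_right(left), primer_extending_left(right)))
    return products

hits = [AluHit(100, 400, "+"), AluHit(900, 1200, "-"), AluHit(1235, 1535, "+")]
products = inter_alu_products(hits, max_gap_bp=3000)
for p in products:
    print(p, "length", p[1] - p[0])
# Gel selection for inserts longer than 200 bp discards the 35 bp product.
print([p for p in products if p[1] - p[0] > 200])
```

In this toy example the second product is only 35 bp long, the kind of short product from closely spaced Alu elements that the preparative gel steps were intended to exclude.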

Both methods suffered limitations which precluded the isolation of certain sequences or included undesired sequences in the selection of clones. In some ways, the two methods were complementary in this regard. The main weakness of the PCR library stemmed from the fact that many Alu family elements exist as directly repeated tandem multimers. Any such arrangement formed a short (30-40 bp) sequence which was an ideal candidate for amplification with the primers we employed. The polymerase chain reaction products were subjected to two cycles of preparative gel electrophoresis before cloning, to select inserts of greater than 200 bp. Despite this, only 220 of 431 clones contained inserts of greater than 150 bp, and only 178 of these contained more than 100 bp of non-Alu family sequence.

Table I. MER sequences.
Characteristics of 21 MER sequences are presented, including the source by which they were discovered, the number of their occurrences in GenBank 65, their repetition frequencies by the plaque hybridization assay, the numbers of their occurrences in both the Alu fragment and PCR libraries, and their hybridization signals to rodent and bovine chromosomal DNAs. DASHER2 is the computer program for rapid similarity searches. The repetition frequencies are given in copies per haploid human genome. MER3 was a loosely homologous sequence family detected by DASHER2 analysis, but a probe made from this family did not detect any repetitive sequences by the plaque hybridization assay.

The Alu fragment library was designed to contain pieces of Alu family repeats at the boundaries, with non-Alu family DNA in between. However, 57 of the 131 clones isolated by the Alu fragment method contained otherwise intact Alu family repeats which lacked the characteristic AluI restriction enzyme cleavage site. In most cases, the sequence data obtained from such clones consisted of Alu family sequence with little or no flanking region. A further weakness of this method was that the clones were designed to terminate at all AluI restriction sites. Thus any region of the genome spanning an AluI restriction site was excluded from this library.

Figure 1. Hybridization of mammalian chromosomal DNAs with MER DNA probes. Five µg samples of restriction endonuclease-digested chromosomal DNAs were electrophoresed in a 0.7% agarose gel and blotted onto nylon membranes. The membranes were incubated with radiolabeled DNA probes described in Table III or Ref. 3. Shown are autoradiographs of the blots. The lanes are: 1, human DNA digested with EcoRI; 2, rhesus monkey DNA digested with EcoRI; 3, mouse DNA digested with EcoRI; 4, Chinese hamster DNA digested with EcoRI; 5, bovine DNA digested with EcoRI. Positions of molecular weight markers are shown by dashes at the right of each autoradiograph.

Sequence data were obtained from all 562 clones selected from both libraries. These data were then aligned against a data file containing other known primate repetitive elements, including L1 sequences (17), satellite sequences (18), Xba repeats (19), O repeats (20), THE repeats (21), simple polydinucleotides, and homopolymer runs greater than 12 bp. All regions homologous to these elements were removed. At this point the sequence data file contained 58,272 bp, of which 22,285 bp were derived from the Alu fragment library and 35,987 bp from the PCR library. This sequence was divided into blocks of 4-5 kb and aligned with the primate sequence database.
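Two of the preparation steps above are easy to state in code: removing homopolymer runs longer than 12 bp and cutting the remaining sequence into blocks for database alignment. This Python sketch covers only those two steps; removal of L1, satellite, Xba, O, and THE homologies requires alignment against reference repeats and is not modelled. The block-splitting rule (roughly equal blocks of at most 5 kb) is an assumption; applied to 58,272 bp it gives twelve blocks of 4,856 bp, within the 4-5 kb range described. Function names are hypothetical.

```python
import math
import re

# Runs of a single base longer than 12 bp (13 or more identical bases).
HOMOPOLYMER = re.compile(r"A{13,}|C{13,}|G{13,}|T{13,}")

def remove_homopolymers(seq: str) -> str:
    """Delete homopolymer runs longer than 12 bp from the sequence."""
    return HOMOPOLYMER.sub("", seq.upper())

def split_into_blocks(seq: str, max_block: int = 5000) -> list[str]:
    """Split into near-equal consecutive blocks no longer than max_block."""
    if not seq:
        return []
    n_blocks = math.ceil(len(seq) / max_block)
    size = math.ceil(len(seq) / n_blocks)
    return [seq[i:i + size] for i in range(0, len(seq), size)]

blocks = split_into_blocks("ACGT" * 14568)  # 58,272 bp placeholder sequence
print(len(blocks), {len(b) for b in blocks})  # 12 {4856}
```

Short inputs yield smaller blocks under this rule, so the 4-5 kb range holds only for sequence files of roughly this size or larger.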



Table II. Locations of MERs in known chromosomal loci.
The 56 MERs with known genomic locations are listed. All sequences use the GenBank numbering system. The column 'Alu sequences' refers to the presence or absence of Alu family repeats within 500 bp 5' or 3' of the given genomic location, and the arrows refer to the orientation of the Alu repeats. For example, '- →' means that there is no Alu sequence 5' to the genomic location and that there is an Alu 3' to the genomic location which is oriented in the 5' to 3' direction. The letters 'ins' refer to an insertion of an Alu element within the MER element. 'MER #' is the MER designation of each repetitive element. 'Length in bp' refers to the total length of the given genomic sequence in the database; 'kb per MER' is obtained by dividing the length in kb of each locus by the number of MERs identified within that locus.
Twenty-seven sequences which showed matches to the primate database (scores greater than 150 on initial scoring or 300 on optimized scoring) were investigated further. Of these twenty-seven sequences, nine were eliminated because they were members of previously described MER families. Double-stranded DNA probes (100-300 bp) were constructed from the eighteen remaining sequences. Eleven of these probes identified repeat sequence families by the plaque hybridization assay (Experimental Methods). Probes were also constructed for ten additional MER families discovered by DASHER2 analysis of the GenBank 65 database. The estimated reiteration frequencies of all these probes are given in Table I.

The screening processes used here relied on the occurrence of homologous regions in the human sequence database. At least one region of homology was required with the sequence library methods, and at least two regions of homology were required with the DASHER2 method. It was conceivable that some of the sequences in the Alu fragment or PCR libraries could be MER elements which are not represented in the human sequence database. To test this hypothesis, probes were constructed for sixteen sequences from the libraries which did not have a significant match with the primate database. None of these probes identified repeat sequence families in the plaque hybridization assay.

Hybridization of MERs to chromosomal DNAs

Genomic blot hybridizations were performed in order to evaluate the distribution of MER repeats in genomic DNA. Double-stranded DNA probes for all MER families were radiolabelled and hybridized to EcoRI digests of primate, bovine, Chinese hamster, and mouse chromosomal DNAs. The resulting autoradiographs (twelve representative films are shown in Fig. 1) showed hybridization of these probes to dispersed sets of restriction fragments from human and rhesus monkey chromosomal DNAs. Similar results have been reported for another MER1 probe (22). Some of the patterns showed what appeared to be discrete length bands within a background of heterogeneous length fragments. Further investigation is required to explain the origin of these discrete bands. None of the probes showed hybridization to mouse or Chinese hamster chromosomal DNAs. The bovine chromosomal DNA showed hybridization signals for ten of the probes. These signals were always less intense than those from an equivalent amount of primate DNA, but contrasted with the absence of any detectable signal in the rodent chromosomal DNAs.



DISCUSSION 

Isolation of MERs from the sequence libraries 

The sequence library methods were designed to select for novel MERs on the basis of their proximity to Alu family repeats. Of the 56 known genomic MER locations, 26 lie within 500 bp of an Alu repeat (Table II). The Alu fragment and PCR libraries were constructed so as to select for regions of the genome adjacent to Alu repeats, and both libraries were clearly enriched for MERs. The 59 kb of sequence analysed contained 20 examples of MERs, or one repeat for every 2.9 kb. This density is higher than in known genomic loci containing MERs. Several of the long known gene sequences contain multiple examples of MERs, but even the most richly endowed sequence (HUMTPA) has fewer MERs per unit length (5.2 kb per MER) than either the PCR or Alu fragment libraries (Table II).
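The densities compared above are simple ratios of sequence length to MER count. A short Python sketch, assuming the 2.9 kb figure is based on the 58,272 bp data file described in the Results and the 20 MERs found in it, reproduces that value and expresses the HUMTPA figure of 5.2 kb per MER relative to it. The function name is hypothetical.

```python
def kb_per_mer(length_bp: int, n_mers: int) -> float:
    """Average spacing of MERs: sequence length in kb divided by MER count."""
    if n_mers <= 0:
        raise ValueError("n_mers must be positive")
    return length_bp / 1000 / n_mers

library = kb_per_mer(58_272, 20)  # combined library sequence after repeat removal
print(round(library, 1))          # 2.9
print(round(5.2 / library, 1))    # 1.8: HUMTPA spacing relative to the libraries
```

On these numbers, MERs are roughly 1.8 times more closely spaced in the library sequence than in the richest known locus.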



Description of MER families 

Sequence alignments are presented for fourteen families, MER8 to MER21 (Table III). Sequence alignments for MER1 to MER6 (3) and MER7 (4) have been presented previously.

MER8. This sequence was present in the Alu fragment library. 
It is 60% homologous to a 160 bp region in the tissue plasminogen 
activator gene (23), and has a 70 bp homology to the tissue factor 
0 gene (24). 

MER9. This sequence was found by DASHER2 analysis of GenBank as an 88% homology over 270 bp. The probe for this family was made from a sequence within intron e of the human gene for blood clotting factor IX (25).

MER10. This sequence was first noted as being repetitive by Lawrance et al. (26). It appears four times in the human sequence database, and different family members share about 80% homology. MER10 has also been studied by Mermer et al. (27), who referred to this family as the MstII repeats. The probe for this family was constructed from an intergenic region in the human HLA locus (26).

MER11. This sequence was found by DASHER2 analysis of GenBank. It is the longest MER found, showing matches over 1,100 bp in three different genes. MER11 sequences exhibit length differences due to the presence of variable numbers of a 50 bp subrepeat, which is present four times in the HUMP45C17 sequence (marked by underlines in Table III), three times in the HUMSIGMG3 sequence, and once in the HUMGSTPIA sequence. Akahori et al. (28) noticed this 50 bp subrepeat in the HUMSIGMG3 sequence. Two probes for this family were constructed, one from the 5' region and another from the 3' region of the MER11 repeat in the human gene for cytochrome P450-C17 (29). Both probes indicated similar repeat frequencies in the plaque hybridization assay.
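Length variation of this kind can be quantified by counting tandem copies of the subrepeat unit in each locus. The 50 bp unit itself is not reproduced in this text, so the Python sketch below uses a hypothetical 10 bp unit and a toy sequence. It counts non-overlapping windows that match the unit with at most a chosen number of mismatches; this simple Hamming-distance scan is an assumed method, not the procedure used by the authors.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def count_subrepeats(seq: str, unit: str, max_mismatches: int) -> int:
    """Count non-overlapping copies of unit in seq, allowing a few mismatches."""
    k = len(unit)
    count, i = 0, 0
    while i + k <= len(seq):
        if hamming(seq[i:i + k], unit) <= max_mismatches:
            count += 1
            i += k  # skip past this copy so it is not counted twice
        else:
            i += 1
    return count

UNIT = "ACGTTGCAAC"     # hypothetical stand-in for the 50 bp subrepeat
variant = "ACGTAGCAAC"  # one mismatch relative to UNIT
toy_locus = "GG" + UNIT + variant + UNIT + "TT"
print(count_subrepeats(toy_locus, UNIT, max_mismatches=1))  # 3
print(count_subrepeats(toy_locus, UNIT, max_mismatches=0))  # 2
```

The mismatch allowance matters: diverged copies of a subrepeat are missed by exact matching, so copy counts such as four, three, and one depend on how much divergence is tolerated.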

MER12. This sequence was found in the PCR library. There are four appearances of this family in the primate database, each sharing 60-70% homology with the others.

MER13. This sequence was found by computer analysis of the database. A probe was constructed from an intergenic region of the macaque β-globin gene region (30). The probe shared 93% homology with its orthologous counterpart in the human β-globin region and 70-75% homology with the other members of this family. The MER13 sequence may be part of a variant L1 sequence. It was considered to be the 3' portion of an L1 repeat in the human gene for blood clotting factor IX (25). It lies adjacent to, but was not considered part of, an L1H repeat in the human β-globin locus (31). MER13 is also adjacent to an L1H repeat in the human crystallin gene locus (32) and in the 5' flanking sequence of the human glutathione S-transferase pi gene (33). However, MER13 was not associated with L1 repeats in its other two known genomic locations.

MER14. This sequence was found in the PCR library. It shares 75-80% homology with two sequences in the primate database. At one location MER14 flanks the 5' side of a deletion in a mutant gene for the alpha chain of β-hexosaminidase A (34). The 3' side of this deletion is within an Alu family repeat.
Table III. Alignment of MER8-21 with their locations at sequenced loci.
[Alignments not legible in this copy. Loci recoverable from the alignment labels: MER11 with HUMP45C17, HUMSIGMG3 and HUMGSTPIA; MER12A1 with HUMBCR22I, HUMTPA, HUMHPRTB and HUMCYP345; MER13 with MACHBB, HUMHBB, HUMGSTPIA, HUMFIXG, HUMCRYGBC and HUMINSRD; MER14A1 with HUMHXAMUT and HUMIL2RBA; MER15A1 and MER15B1 with HUMSIGMG3, HUMC21DLA, HUMMBP1A, HUMHPRTB and HUMINT2; MER16A1 with HUMATP1A2 and HUMLMWOS1; MER17A1 with HUMCRYGBC; MER18A1 with HUMSEXREPB, HUMHPRTB (two locations), HUMPADP and HUMC21DLA; MER19A1 with HUMRASSK2; MER20A1 with HUMRPS17A; MER21A1 with HUMCYAR01.]

The best-match alignments from the MacVector programs are shown. Matches are indicated by periods, while mismatches are indicated by the letters G, A, T, C
or by dashes. Insertions are indicated by numbers showing their lengths. The boxed sequences indicate the regions used as probes in hybridization experiments. Nomenclature
was adopted to allow future identification of additional members or subfamilies of MER repeats. For example, MER8A1 indicates example number 1 of the A subfamily
of MER8. The new sequences presented here have been assigned the following EMBL accession numbers: (MER12A1; X59017), (MER14A1; X59018), (MER15A1;
X59019), (MER16A1; X59020), (MER15B1; X59021), (MER17A1; X59022), (MER19A1; X59023), (MER18A1; X59024), (MER21A1; X59025), (MER20A1;
X59026). Arrows (< or >) indicate the 5' to 3' directions of the sequence numberings. For MER15 the best consensus sequence is a mosaic of sequences and is
indicated by dotted underlines. For MER11, underlines denote the 50 bp subrepeat described in DISCUSSION.
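The homology percentages quoted in the family descriptions come from alignments in this dot notation. As a rough illustration of how such a figure can be read off one alignment row, the Python sketch below counts periods as matches and letters or dashes as non-matching positions. The function name `percent_identity` is hypothetical, insertion-length digits, arrows and spaces are simply skipped (a simplification), and MacVector's own scoring may differ, so this sketch need not reproduce the published values.

```python
def percent_identity(aligned: str) -> float:
    """Estimate percent identity from one dot-notation alignment row.

    Simplified reading of the legend (an assumption, not MacVector's rules):
      '.'         -> base identical to the MER sequence above it
      A, C, G, T  -> mismatched base
      '-'         -> gap, counted as a non-matching position
    Digits (insertion lengths, sequence numbers), arrows and spaces are ignored.
    """
    matches = 0
    compared = 0
    for ch in aligned.upper():
        if ch == ".":
            matches += 1
            compared += 1
        elif ch in "ACGT-":
            compared += 1
        # any other character is skipped
    if compared == 0:
        raise ValueError("no aligned positions found")
    return 100.0 * matches / compared


print(percent_identity("..A..-...."))  # 8 matches out of 10 positions -> 80.0
```

Counting gaps as mismatches makes the estimate conservative; a program that ignores gapped columns would report a higher identity for the same row.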



MER15. This sequence was noted by Akahori et al. (28) as being
part of a repeat unit in the human immunoglobulin locus. Two
sequences from the PCR library, MER15A1 and MER15B1, shared
homology with this repeat unit, which was named the MER15
family. Although MER15A1 and MER15B1 share only short
patches of homology with each other, together they form a
mosaic consensus sequence. Database searches identified five
sequences sharing 75-80% homology with this consensus. MER15
sequences are found directly adjacent to MER18 sequences at two
locations (HUMHPRTB 54970 and HUMC21DLA 1560). However,
the two sequences are not juxtaposed at other genomic locations,
and hybridization experiments indicate different repetition
frequencies, so MER15 and MER18 are considered distinct MER
families.

MER16. This sequence was found in the PCR library. It shares 
70% homology with two human sequences from the database. 

MER17. This sequence was found in the Alu fragment library. 
Although short, it shares 87% homology with an intergenic 
sequence in the human crystallin gene locus (32). 

MER18. This sequence was found in the PCR library. It is a
member of a repeat family first described by Fisher et al. (35).
This family is rather loosely conserved (60-65% homology) and
is spread throughout the chromosomes. A more tightly conserved
subfamily (93% homology) is embedded in tandem duplications
on the human sex chromosomes (35). Although MER18 sequences
are found directly adjacent to MER15 sequences at two locations
(HUMHPRTB 54970 and HUMC21DLA 1560), MER18 is still
considered a distinct MER family.

MER19. This sequence was found in the Alu fragment library.
It contains a 100 bp region which is 85% homologous to a region 
5' to the human SK2 c-Ha-ras-1 oncogene (36). 



MER20. This sequence was found in the Alu fragment library.
Hybridization experiments indicate a low copy number (200-400)
per haploid genome. One of these hybridization targets may be a
sequence in the first intron of the human gene for ribosomal
protein S17, which shares 60% homology with MER20 (37).

MER21. This sequence was found in the Alu fragment library.
As with MER20, hybridization experiments indicate a repetitive
sequence, but database searches revealed only a 60% homology
to a 100 bp sequence in the 5' flank of the human aromatase
cytochrome P450 gene (38). A homology this weak may not
have been detected in our hybridization experiments.

Undiscovered MER families 

The 21 known MER families account for 30,000-60,000
repetitive elements in the human genome (Table I). There are
surely more MER families remaining to be discovered. Their
number can be estimated in several ways. The simplest is to
assume that the sequence libraries constructed in this study
select MERs at random. Then the proportion of library MERs
that belong to already known MER families should equal the
proportion of MERs in the human genome that have already
been assigned to known families. Specifically, the sequence
libraries contained 20 matches to the GenBank database, of
which 9 were members of known MER families. Thus 9/20, or
45%, of MERs are in known families and 55% remain to be
discovered. Scaling the 30,000-60,000 known elements up by
this proportion (dividing by 0.45) implies the existence of
roughly 70,000-140,000 MERs in the human genome.
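The proportion argument amounts to dividing the number of known elements by the fraction of library matches that fell into known families. A minimal Python sketch, assuming random sampling as the text does; the function and parameter names are illustrative only. The exact quotients are about 66,700 and 133,300, which the text reports as roughly 70,000-140,000.

```python
def extrapolate_total(known_elements: float, known_hits: int, library_hits: int) -> float:
    """Estimate the total number of MERs from the fraction already assigned.

    Assumes library MERs are a random sample of genomic MERs, so the
    fraction of library hits in known families equals the fraction of
    all genomic MERs that belong to known families.
    """
    fraction_known = known_hits / library_hits  # 9/20 = 0.45 in the text
    return known_elements / fraction_known


# Known families account for 30,000-60,000 elements (Table I).
low = extrapolate_total(30_000, known_hits=9, library_hits=20)
high = extrapolate_total(60_000, known_hits=9, library_hits=20)
print(f"{low:,.0f} - {high:,.0f}")  # 66,667 - 133,333
```

The estimate is only as good as the random-sampling assumption: if the Alu-flanking and PCR libraries favour particular families, the known fraction, and hence the total, will be biased.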

4738 Nucleic Acids Research, Vol. 19, No. 17

Another way to estimate the abundance of MERs is to
extrapolate from the abundance of known MERs within
sequenced gene loci. This estimate is necessarily a lower
bound, because it neglects any undiscovered MERs that may
lie within these loci. Nonetheless, some rough figures can be
assembled. Five gene loci (demarcated by boxes in Table II)
contain 227,494 bp of DNA and 22 MERs. This extrapolates to
242,000 MERs in the entire human genome.
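The density extrapolation is a single proportion: MERs per base pair in the sequenced loci, multiplied by the genome size. The text does not state which haploid genome size it used; the value of 2.5 x 10^9 bp in the sketch below is our inference, chosen because it reproduces the reported 242,000, and the function name is hypothetical. The result scales linearly with whatever genome size is assumed.

```python
def extrapolate_by_density(mers_found: int, bp_surveyed: int, genome_bp: float) -> float:
    """Scale the MER density observed in sequenced loci up to a whole genome.

    This is a lower bound, since MERs from undiscovered families inside
    the surveyed loci are not counted.
    """
    mers_per_bp = mers_found / bp_surveyed
    return mers_per_bp * genome_bp


# ASSUMPTION: haploid genome size is not given in the text; 2.5e9 bp
# is inferred because it reproduces the reported figure of 242,000.
GENOME_BP = 2.5e9
estimate = extrapolate_by_density(22, 227_494, GENOME_BP)
print(f"one MER per {227_494 / 22:,.0f} bp; about {estimate:,.0f} MERs genome-wide")
```

With five loci and 22 elements, the sampling error on the density is large, so the result is best read as an order of magnitude.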

Although both of these estimates are crude, they give some
notion of the number and complexity of MER families in the
human genome. More work is required in many areas before
the nature and function of MER families will be understood. Even
at this early stage, several interesting features stand out. One is
the apparent clustering of MER repeats at certain genetic loci
(Table II). This clustering may imply that MER families influence
gene expression, although one experimental test of this idea failed
to support it (22). Another is the diversity in levels of homology
between the different MER families, ranging from >90% (MER9,
MER11) to <70% (MER12, MER18). Because older families
have had more time to accumulate sequence divergence, this
implies that some of the MER families are of recent evolutionary
origin while others are ancient. Some of the MER families may
be old enough to be relics of Alu-type repeats from the early
times of the mammalian radiation. This latter interpretation is
supported by the observation that ten of the MER probes showed
hybridization signals with bovine chromosomal DNAs (Fig. 1).
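The step from homology level to relative age can be made slightly more concrete with a standard substitution model. The text does not use one; the sketch below swaps in the Jukes-Cantor (JC69) correction purely as an illustration, treating the quoted homology as percent identity to a family consensus and assuming comparable substitution rates across families (both assumptions). Under them, the >90% and <70% families correspond to about 0.11 and 0.38 substitutions per site, roughly 3.6 times as much accumulated change. Converting these distances to years would require a substitution rate, which the text does not give.

```python
import math


def jukes_cantor_distance(percent_identity: float) -> float:
    """Jukes-Cantor (JC69) corrected substitutions per site.

    d = -3/4 * ln(1 - 4p/3), where p is the observed fraction of
    differing sites. Treating MER homology as identity is an assumption.
    """
    p = 1.0 - percent_identity / 100.0
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Boundary values quoted in the text for the most and least conserved families.
for label, identity in [("MER9, MER11 (>90%)", 90.0), ("MER12, MER18 (<70%)", 70.0)]:
    print(f"{label}: {jukes_cantor_distance(identity):.3f} substitutions/site")
```

The correction grows faster than the raw difference as identity falls, which is why the least conserved families look disproportionately older than a simple comparison of 10% versus 30% divergence would suggest.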

Medium reiteration frequency (MER) repeats are significant
additions to the repertoire of interspersed repetitive DNA that
has so far been described in the human genome. As the DNA
databases grow in size, more MER families will be discovered.
These families merit further study both for their intrinsic
importance and for their use as mapping sites in developing
a physical map of the human genome.